Maximum Entropy Tagging with Binary and Real-Valued Features
Abstract
Recent literature on text tagging has reported successful results from applying Maximum Entropy (ME) models. In general, ME taggers rely on carefully selected binary features, which try to capture discriminant information from the training data. This paper introduces a standard setting of binary features, inspired by the literature on named-entity recognition and text chunking, and derives corresponding real-valued features based on smoothed log-probabilities. The resulting ME models have orders of magnitude fewer parameters. Effective use of training data to estimate features and parameters is achieved by integrating a leaving-one-out method into the standard ME training algorithm. Experimental results on two tagging tasks show statistically significant performance gains after augmenting standard binary-feature models with real-valued features.
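To make the contrast between the two feature types concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the abstract does not say which smoothing method is used, so add-alpha smoothing is an assumption, and the function names, the toy data and the choice of the word/tag pair as the feature context are all hypothetical. The leaving-one-out estimation mentioned in the abstract is not shown.

```python
import math
from collections import Counter


def train_counts(tagged_tokens):
    """Collect (word, tag) and tag counts from a list of (word, tag) pairs."""
    pair_counts = Counter(tagged_tokens)
    tag_counts = Counter(tag for _, tag in tagged_tokens)
    vocab = {word for word, _ in tagged_tokens}
    return pair_counts, tag_counts, vocab


def binary_feature(word, tag, target_word, target_tag):
    """Indicator feature: fires only for one specific (word, tag) pair,
    so a model needs one such feature (and parameter) per pair."""
    return 1.0 if (word == target_word and tag == target_tag) else 0.0


def real_valued_feature(word, tag, pair_counts, tag_counts, vocab_size, alpha=1.0):
    """Single real-valued feature: smoothed log p(word | tag).
    Add-alpha smoothing is an assumption made for this sketch; the extra
    '+ 1' reserves probability mass for words never seen in training."""
    numerator = pair_counts[(word, tag)] + alpha
    denominator = tag_counts[tag] + alpha * (vocab_size + 1)
    return math.log(numerator / denominator)


# Hypothetical toy training data.
data = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("the", "DT"), ("cat", "NN")]
pair_counts, tag_counts, vocab = train_counts(data)

seen = real_valued_feature("dog", "NN", pair_counts, tag_counts, len(vocab))
unseen = real_valued_feature("zebra", "NN", pair_counts, tag_counts, len(vocab))
```

In this sketch one real-valued feature covers every word/tag combination with a single weight, whereas the binary setting needs a separate indicator per pair, which is one way to read the abstract's claim about the reduction in parameters.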
Similar resources
Learning Structured Information in Natural Language Applications
Recent literature on text-tagging reported successful results by applying Maximum Entropy (ME) models. In general, ME taggers rely on carefully selected binary features, which try to capture discriminant information from the training data. This paper introduces a standard setting of binary features, inspired by the literature on named-entity recognition and text chunking, and derives correspond...
Tagging Unknown Words with Raw Text Features
Processing unknown words is disproportionately important because of their high information content. It is crucial in domains with specialist vocabularies where relevant training material is scarce, for example: biological text. Unknown word processing often begins with Part of Speech (POS) tagging, where accuracy is typically 10% worse than on known words. We demonstrate that features extracted...
Using PCA with LVQ, RBF, MLP, SOM and Continuous Wavelet Transform for Fault Diagnosis of Gearboxes
A new method based on principal component analysis (PCA) and artificial neural networks (ANN) is proposed for fault diagnosis of gearboxes. First, six different base wavelets are considered, three real-valued and three complex-valued. Two wavelet selection criteria, the Maximum Energy to Shannon Entropy ratio and Maximum Relative Wavelet Energy, are used and compared...
A Weighted Maximum Entropy Language Model for Text Classification
The maximum entropy (ME) approach has been extensively used in various natural language processing tasks, such as language modeling, part-of-speech tagging, text classification and text segmentation. Previous work in text classification was conducted using maximum entropy modeling with binary-valued features or counts of feature words. In this work, we present a method for applying Maximum Entro...
NING MA et al: FUSION OF WORD CLUSTERING FEATURES FOR TIBETAN PART OF SPEECH TAGGING
Tibetan part-of-speech (POS) tagging, the foundation of Tibetan natural language processing, assigns word classes according to the contextual information of words. Within the framework of the maximum entropy model, the paper studies the fusion of morphological features for Tibetan part-of-speech tagging with word clustering features. Experimental result...
Publication date: 2006